perm filename RESULT[DIS,DBL]4 blob sn#216643 filedate 1976-05-23 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00011 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.NSECP(Results)
C00004 00003	.SSEC(What AM Did)
C00006 00004	. SSSEC(AM as a Mathematician)
C00015 00005	. SSSEC(AM as a Computer Program)
C00019 00006	.SSEC(Experiments with AM)
C00029 00007	. SSSEC(Must the Worth numbers be finely tuned?)
C00038 00008	. SSSEC(How finely tuned is the Agenda?)
C00047 00009	. SSSEC(What if certain key concepts are eliminated?)
C00049 00010	. SSSEC(What if certain heuristics are tampered with?)
C00050 00011	. SSSEC(Can AM work in a new domainα: Plane Geometry?)
C00052 ENDMK
C⊗;
.NSECP(Results)

.R1PAGE: PAGE;

This chapter opens  by summarizing what  AM "did". Section 1  gives a
fairly high-level description of the major paths which were explored,
the concepts discovered along  the way, the relationships which  were
noticed, and the ones which "should" have been noticed but weren't.

The next section continues this  exposition by presenting the results
of experiments which were done with (and ⊗4on⊗*) AM.

.ONCE TURN ON "{}"

Chapter {[2] EVALU} will draw upon these results -- and others given
in the appendices -- to form conclusions about AM. Several meta-level
questions will be tackled there (e.g., What are AM's limitations?).

.SSEC(What AM Did)

AM is both a mathematician of sorts, and a big computer program.

.BEGIN TURN ON "{}"

By granting  AM more anthropomorphic  qualities than it  deserves, we
can  describe  its  progress  through  elementary  mathematics.    It
rediscovered many  well-known  concepts,  a  couple of  interesting  but
not-generally-known ones,  and several  concepts which  were hitherto
unknown    and   should    have   stayed    that   way.       Section
{SECNUM}.{SSECNUM}.1 recaps what  AM did, much  as a historian  might
critically evaluate Euler's work.


By  under-estimating AM's sophistication,  one can demand  answers to
the typical questions asked about any computer program: how big is it,
how much cpu time does it use, what is  the input/output, etc.  These
are treated in Section {SECNUM}.{SSECNUM}.2.

.END

. SSSEC(AM as a Mathematician)

Let's take a moment to discuss the  totality of the mathematics which
AM carried  out.  Like a contemporary  historian summarizing the work
of Euclid, we shall not hesitate to use current terms,  and criticize
by current standards.

AM  began  its   investigations  with  scanty  knowledge   of  a  few
set-theoretic  concepts  (sets,  equality of  sets,  set operations).
Most of the obvious set-theory relationships (e.g., de Morgan's laws)
were eventually  uncovered. Since AM never  fully understood abstract
algebra, the statement of each  of these was  quite
obscure. Since AM never understood formal proof, their verification
was in general merely empirical.

No explicit conception of infinity was ever derived, but AM  did
create  a procedure for  making arbitrarily  long chains of  new sets
("insert a  set  into  itself").    On the  other  hand,  AM  naively
established conjectures like  "a set has larger cardinality than any 
of its proper subsets",
showing  it had no real comprehension of infinite sets.  No
sophisticated set theory  (e.g., diagonalization) was  ever
done.
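
The "insert a set into itself" procedure can be sketched in modern
terms (this Python fragment is my reconstruction, not AM's Interlisp):

```python
# Reconstruction (in Python, not AM's Interlisp) of the "insert a
# set into itself" procedure: each step inserts the current set
# into a copy of itself, so the chain of nested sets can be made
# arbitrarily long even though no explicit notion of infinity exists.
def insert_into_self(s):
    """Return the set s with s itself added as an element."""
    return frozenset(s | {s})

chain = [frozenset()]               # start from the empty set
for _ in range(5):
    chain.append(insert_into_self(chain[-1]))

# Each set in the chain is new, and one element larger than the last.
assert [len(s) for s in chain] == [0, 1, 2, 3, 4, 5]
```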

After this initial period  of exploration, AM decided that "equality"
was  worth  generalizing,   and  thereby   discovered  the   relation
"same-size-as".  The natural number system 
was based on this,  and soon most simple
arithmetic  operations were defined.  Addition arose as  an analog to
union. Multiplication arose  in four separate ways:  as an analog  to
cross-product, as repeated addition,  as iterated substitution$$ Take
two sets  A and B. Replace each element of A by the set B. Remove one
level of  parentheses by  taking the  union  of all  elements of  the
transfigured set  A. Then that new  set will have as  many elements as
the product  of the  lengths of  the  two original  sets. $,  and  by
studying the cardinality of  power sets$$ The size of the  set of all
subsets of a set S of size n is 2↑n.  Thus the power set  of A∪B has length equal to
the ↓_product_↓  of  the  lengths  of  the power  sets  of  A  and  B
individually (assuming A  and B are disjoint). $.   So multiplication
immediately captured AM's attention as an interesting concept.
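
Two of these four routes to multiplication can be rendered concretely
(a hypothetical Python sketch; the function names are mine, not AM's):

```python
# Hypothetical sketch (mine, not AM's code) of two of the four
# routes to multiplication described above.

def times_by_substitution(A, B):
    """Replace each element a of A by a tagged copy of B, then
    flatten one level; the result has len(A)*len(B) elements."""
    return [(a, b) for a in A for b in B]

def powerset_size(S):
    """The set of all subsets of a set of size n has 2^n elements."""
    return 2 ** len(S)

A, B = {1, 2, 3}, {"x", "y"}
assert len(times_by_substitution(A, B)) == len(A) * len(B)

# For disjoint A and B, the power set of A∪B has as many elements as
# the product of the sizes of the two individual power sets.
assert powerset_size(A | B) == powerset_size(A) * powerset_size(B)
```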

Soon  after defining  multiplication, AM investigated  the process of
multiplying a number by itself: squaring.  The inverse of this turned
out to be interesting, and led to the definition of square-root.

Although AM was very  close to discovering irrationals at this point,
it turned aside  and was  content to  work with  integer-square-root.
Raising to fourth-powers, and fourth-rooting, were discovered at this
time.

.ONCE TURN ON "{}"

AM  also  worked  on  defining  a  meaningful inverse  operation  for
multiplication. This  led to  both division  and to  factoring.   The
associativity and commutativity of  multiplication indicated that it could
accept  a BAG (a multiset)  of numbers as  its argument, so factoring
was taken to mean finding all bags of numbers  whose product equalled
the given number.  Minimally-factorable numbers turned out to be what
we call primes.  Maximally-factorable numbers were also thought to be
interesting at  the time,  and in fact  an unusual  $$ These are  the
so-called  "highly-composite" numbers  of Ramanujan.   See 
Appendix {[2]MAXDIV}.
$ characterization of such numbers was discovered.
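
AM's notion of factoring into bags can be sketched as follows (a
reconstruction in Python, not AM's code):

```python
def factor_bags(n, smallest=2):
    """All bags (multisets) of integers greater than 1 whose product
    is n; each bag is kept nondecreasing to avoid duplicates."""
    bags = [[n]] if n >= smallest else []
    for d in range(smallest, int(n ** 0.5) + 1):
        if n % d == 0:
            bags += [[d] + rest for rest in factor_bags(n // d, d)]
    return bags

# "Minimally-factorable" numbers admit only the trivial bag [n]:
# exactly the primes.
assert factor_bags(7) == [[7]]
assert sorted(map(sorted, factor_bags(12))) == [[2, 2, 3], [2, 6], [3, 4], [12]]
```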

AM   conjectured  the  fundamental  theorem   of  arithmetic  (unique
factorization into  primes)  and Goldbach's  conjecture  (every  even
number >2 is the sum of  two primes) in a surprisingly symmetric way.
The unary representation of numbers gave way to a representation as a
bag of primes (based  on unique factorization), but AM  never thought
of  exponential notation.    Since the  key concepts  of  modulus and
exponentiation were never developed very far, progress in elementary
number theory  was
arrested.
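
The bag-of-primes representation, and the kind of empirical check AM
made of Goldbach's conjecture, can be rendered as (my sketch, not
AM's code):

```python
def prime_bag(n):
    """Represent n > 1 as its bag of prime factors (the
    representation AM adopted after unique factorization)."""
    bag, d = [], 2
    while d * d <= n:
        while n % d == 0:
            bag.append(d)
            n //= d
        d += 1
    if n > 1:
        bag.append(n)
    return bag

primes = {n for n in range(2, 1000) if prime_bag(n) == [n]}

assert prime_bag(360) == [2, 2, 2, 3, 3, 5]
# Goldbach's conjecture, verified empirically (as AM verified its
# conjectures) for even numbers up to 1000:
assert all(any((n - p) in primes for p in primes) for n in range(4, 1000, 2))
```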

When a  new base of  ⊗4geometric⊗* concepts was  added, AM  began finding
some  more general associations.  In place of  the strict definitions
for  the  equality  of   lines,  angles,  and  triangles,  came   new
definitions  of  concepts we  refer  to  as Parallel,  Equal-measure,
Similar,  Congruent, Translation,  Rotation, plus many  which have no
common name (e.g. the relationship of two triangles  sharing a common
angle).  A cute geometric interpretation of Goldbach's conjecture was
found$$  Given   all   angles  of   a   prime  number   of   degrees,
(0,1,2,3,5,7,11,...,179 degrees),  then any angle  between 0  and 180
degrees can be approximated (to within 1 degree) as the sum of two of
those angles. If our culture and our technology were  different, this
result  might have  been a  well-known one.  $.   Lacking a  geometry
"model" (an analogic representation like the one Gelernter employed),
AM  was  doomed  to  failure  with  respect  to  proposing  geometric
conjectures.
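
The geometric reading of Goldbach mentioned in the footnote is easy
to verify mechanically (a Python sketch of the claim, not AM's code):

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Angles of a prime number of degrees, as in the footnote:
# 0, 1, 2, 3, 5, 7, 11, ..., 179 degrees.
prime_angles = [0, 1] + [n for n in range(2, 180) if is_prime(n)]

def best_error(target):
    """Closest any sum of two prime angles comes to the target."""
    return min(abs(a + b - target) for a in prime_angles for b in prime_angles)

# Every angle from 0 to 180 degrees is within 1 degree of such a sum.
assert all(best_error(t) <= 1 for t in range(181))
```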

Similar restrictions due to poor "visualization" abilities would crop
up in topology.   The concepts of  continuity, infinity, and  measure
would have  to be  fed to  AM before  it could enter  the domains  of
analysis. More and  more drastic changes in its initial base would be
required, as the desired domain gets further and further  from simple
set theory.

. SSSEC(AM as a Computer Program)

<<Is this section worth including? Change of format, perhaps?>

When viewed as a large LISP program, there is very little of interest about
AM. There is the usual battery of customized functions (e.g., a conditional
PRINT function), the storage hacks (special emergency garbage collection
routines, which know which facets are expendable), the time hacks
(omnisciently arrange clauses in a conjunction so that the one most likely
to fail will come first), and the bugs (if the user renames a concept
while it's the current one being worked on, 
there is a 5% chance of AM entering an infinite loop).

Below are listed a few parameters of the system, although I doubt that
they hold any theoretical significance. The reader may be curious about how
big AM is, how long it takes, etc.



Machine: SUMEX, PDP-10, KI-10 uniprocessor, 256k core memory.

Language: Interlisp, January '75 release, 
which occupies 140k, but which provides a surplus  "shadow space"
of 256k additional words available for holding compiled code.

AM support code: 
200 compiled (not block-compiled) utility routines, control routines, etc.
Occupy roughly 100k, but all are pushed into the shadow space.

AM itself: 115 concepts, each occupying about 1k. Facet/entries stored as
property/value on the property list of atoms whose names are concepts' names.

Snazzy feature:
Executable entries on facets  (e.g., an entry on Union.Alg) are stored uncompiled
until the first time they are actually called on, at which time they are compiled
and then executed.
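
A rough analogue of this compile-on-first-call trick, in Python
rather than Interlisp (the class and names are mine):

```python
class LazyEntry:
    """A facet entry stored as source text; it is compiled only the
    first time it is actually called, and then cached."""
    def __init__(self, source, name="entry"):
        self.source, self.name, self._fn = source, name, None

    def __call__(self, *args):
        if self._fn is None:                 # first call: compile now
            namespace = {}
            exec(compile(self.source, "<facet>", "exec"), namespace)
            self._fn = namespace[self.name]
        return self._fn(*args)               # later calls reuse the result

# E.g., a hypothetical entry on Union.Alg:
union_alg = LazyEntry("def entry(a, b):\n    return a | b")
assert union_alg._fn is None                 # still uncompiled
assert union_alg({1, 2}, {2, 3}) == {1, 2, 3}
assert union_alg._fn is not None             # compiled on first use
```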

"Final" state of AM: 300 concepts, each occupying about 1k. Many are swapped out
onto disk. Original 115 concepts have grown to an average size of 2k.

Cpu time (and number of tasks chosen) at the moment various discoveries were made:

.BEGIN FILL PREFACE 0 INDENT 10 TURN ON "\" TABS 30,45

Cardinality:\15 minutes;\40 tasks

Unique factorization:\25 minutes;\60 tasks

<Maybe give a few more, intermediate ones>

.E

<<Other details about AM as a computer pgm>

.SKIP 1

.SSEC(Experiments with AM)

.EXPT: SECNUM;

.EXPTSSEC: SSECNUM;

.EXPTPAGE: PAGE;

We have now described the activities AM  carried  out during its  best
run.   AM was working by itself, and each  cycle it executed the top task
on the  agenda.   It  received no  help  from the  user, and  all its
concepts' Intuitions facets had been removed.

One valuable aspect of  AM is  that it  is amenable  to many  kinds of
interesting experiments.  Although AM is too ⊗4ad hoc⊗* for numerical
results to have much significance, the qualitative results perhaps do
have  some   valid  things  to  say  about   research  in  elementary
mathematics, about  automating  research,  and  at  least  about  the
efficacy of various parts of AM's design.

This section will explain  what it means to perform  an experiment on
AM,  what kinds  of experiments  are imaginable,  which of  those are
feasible, and  finally will  describe the  5  experiments which  were
performed on AM.

By modifying AM in various ways, its behavior can be altered, and the
⊗4quality⊗*  of  its  behavior  will change  as  well.  As  a drastic
example, if all the heuristics were removed, then AM  would just "sit
there"  -- with  no  plausible generators,  its  agenda would  remain
empty.

By careful planning, each experiment can tell us something new  about
AM: how  valuable a  certain piece  of it  is, how  robust a  certain
scheme  really is, etc.   The results of these  experiments would then
have  something  to   contribute  to  a   discussion  of  the   "real
intelligence"  of  AM  (e.g.,  what features  were  superfluous),  and
contribute to  the design of the "next" AM-like system.  Generalizing
from those  results,  one might  suggest  some hypotheses  about  the
larger task of automated math research.

Let's cover the different  ⊗4kinds⊗* of experiments one could perform
on AM:

(i) Remove individual  concept modules,  and/or individual  heuristic
rules. Then  examine how  AM's performance  is degraded.   AM  should
operate even  if most of its heuristic rules  and most of its concept
modules excised, so  long as ⊗4any⊗*  facets of  any of the  concepts
still exist.  If the remaining  fragment of AM is too small, however,
it may  not be able to find anything interesting to do. In fact, this
situation was actually encountered experimentally, when the first few
partially complete concepts were inserted. If only a little bit of AM
is removed, the remainder  will in fact  keep operating without  this
"uninteresting collapse".   The converse situation should  also hold:
although  still functional  with any  concept module  unplugged, AM's
performance ⊗4should⊗*  be  noticeably degraded.  That is,  while  not
indispensable, each concept should nontrivially help the others.  The
same  holds for each individual heuristic  rule.  Which concepts does
AM now  "miss" discovering?  Is the  removed concept/heuristic  later
discovered  anyway  by those  which  are left  in  AM?   This  should
indicate the importance  of each kind  of concept and  rule which  AM
starts with.

(ii) Vary  the relative  weights given  to features  by the  criteria
which  judge aesthetics, interestingness,  worth, utility,  etc.  See
how important each factor is in directing AM along successful routes.
In other  words, vary the  little numbers  in the formulae  (both the
global  formulae and  the local  ones inside  heuristic rules).   One
important result will be  some idea of the robustness  or "toughness"
of the numeric weighting  factors. If the system easily collapses, it
was too finely tuned to begin with.

(iii) Add several new concept modules (including new heuristics)  and
see if AM can  work in some unanticipated field  of mathematics (like
graph theory or calculus or plane geometry).  Do earlier achievements
-- new  concepts and  relationships involving  concepts  -- have  any
impact in  the new domain?  Are some specialized heuristics  from the
first  domain totally wrong  here? Do all the  old general heuristics
still  hold  here?  Are  they  sufficient,  or   are  some  "general"
heuristics  needed here which  weren't needed  before? Does  AM "slow
down" as more and more concepts get introduced?

(iv) Add several new intuitions, plus a few new concepts, and see  if
AM can develop nonmathematical theories  (like elementary physics, or program
verification).  This  would  also require  limiting  AM's  freedom to
"ignore  a  given  body  of  data  and  move  on  to  something  more
interesting". The exploration of  very non-formalizable fields (e.g.,
politics) will  of course require much more than a small modification
to AM.

(v) Add several  new concepts dealing with  proof, and of course  add
all the associated heuristic rules. Such rules would advise AM on the
fine points of  using various techniques  of proof/disproof: when  to
use them, what to try next based on why the last attempt failed, etc.
See if the ⊗4kinds⊗* of discoveries AM makes are increased.


During  the past few weeks,  several experiments (of  types i,ii, and
iii above$$  experiments  of type  (iv)  weren't tried  because  AM's
intuition  scheme  was  a  failure   as  it  was  initially  designed.
Experiment (v)  will probably occupy the summer of 1976. $) have been
set up and performed on  AM. We're now ready to examine each  of them
in detail.  The following points are covered for each experiment:

.BN

λλ How was it thought of? Why did it come to mind?

λλ What will be gained by  it?  What would be the implications of the
various possible outcomes?

λλ How was the experiment set up? What preparations/modifications had
to be made? How much time (man-hours) did it take?

λλ What happened?  How did AM's  behavior change? Was  this expected?
Analysis.

λλ  What was learned from  this experiment? Can  we conclude anything
which suggests new  experiments (e.g.,  use a better  machine, a  new
domain) or which bears on a  more general problem that AM faced (e.g.,
a new way to teach math, a new idea about doing math research)?

.E

. SSSEC(Must the Worth numbers be finely tuned?)

.EX3PAGE: PAGE;

Each of the 115 initial concepts 
has  supplied to it (by  the author) a number  between 0
and 1000,  stored as its Worth facet,  which is supposed to represent
the overall  value of  the concept.  "Compose" has  a higher  initial
Worth than "Structure-delete", which is  higher than "Equality"$$ As AM
progresses, it notices something interesting about Equality every now
and then, and pushes its  Worth value upwards.   $.

Frequently, the priority of a task involving C depends on the overall
Worth of C. 
How  sensitive is
AM's  behavior to  the initial  settings of  the Worth  facets?   How
finely tuned must these initial Worth values be?

To test this, a simple experiment was performed. Just before starting
AM, the mean value of all concepts' Worth values was computed (around
200). Then  each concept had its Worth reset to  the value 200.  This
was done "by hand", by  the author, in a  matter of seconds.  AM  was
then started and run as if  there were nothing amiss, and its behavior
was watched carefully.

What happened?  By and large, the same major discoveries were made --
and missed -- as usual, in  the same order as usual.  But  whereas AM
proceeded fairly smoothly before, with little superfluous activity, it
now wandered quite blindly  for long periods  of time, especially  at
the  very beginning.  Once  AM "hooked  into"  a line  of  productive
development,  it  followed  it  just  as  always, with  no  noticeable
additional wanderings.  As one of  these lines  of development  died
out, AM would wander around again, until the next one was begun.

It took roughly three times as long for each major discovery to occur
as  normal.  This "delay"  got shorter  and  shorter as  AM developed
further.   In  each  case,  the  tasks preceding  the  discovery  and
following  it were pretty  much the  same as  normal; only  the tasks
"between" two periods of development were different -- and much  more
numerous.  The  precise  numbers  involved  would  probably  be  more
misleading than  helpful, so they will not  be given$$ Any reader who
wishes to  perform  this experiment  can  simply say  [MAPC  CONCEPTS
'(LAMBDA  (c) (SETB  c WORTH  200] to  Interlisp, just  before typing
(START) to begin AM. $.

The  reader may be interested to learn  that the Worth values of many
of the  concepts -- and  most of the  new concepts  -- ended up  very
close  to the same  values that  they achieved  in the  original run.
Overrated concepts were  investigated and  proved boring;  underrated
concepts had  to wait longer  for their  chances, but then quickly  proved
interesting and had their Worth facets boosted.

The conclusion I  draw from this change in behavior is that the Worth
facets are useful for making blind decisions -- where AM  must choose
based  only on  the overall  worths of  the various  concepts in  its
repertoire.  Whenever  a specific  reason  existed, it  was  far more
influential than  the "erroneous"  Worth values.   The close,  blind,
random decisions occur  between long bursts of specific-reason-driven
periods of creative work.

The general answer, then, is ⊗4No⊗*, the initial settings of the Worth
values are not crucial.

Albeit at a  nontrivial cost, the  Worth facets did manage  to correct
themselves  by the  end of  a long$$  A couple of  cpu hours,  about a
thousand tasks  total selected from  the agenda  $ run.   What  would
happen  if the  Worth  facets of  those 110  concepts  were not  only
initialized  to 200, but were  held fixed at 200  for the duration of
the run?

In this  case, the  delay factor still decreased.  That is,  AM
still got more and more "back to normal" as it progressed. The reason
is  that  AM's  later  work  dealt  with  concepts  like   Primes,
Square-root, etc., which were so far removed  from the intial base of
concepts that their Worths were of little consequence.

Even  more drastically, we  could force all  the Worth  facets of all
concepts -- even newly-created  ones -- to be  kept at the value  200
forever. In this case,  AM's behavior doesn't completely disintegrate,
but  that delay factor  actually increases with  time: apparently, AM
begins to suffer from the exponential growth of "things to do" as its
repertoire  of  concepts  grows  linearly.   Its  purposiveness,  its
directionality depends on "focus of attention" more and more, and  if
that feature is removed,  AM loses much of its rationality.   A factor
of 5  delay doesn't sound that bad  "efficiency-wise", but the actual
apparent  behavior of  AM  is  as  staccato  bursts  of  development,
followed  by wild  leaps  to  unrelated concepts.  AM  no longer  can
"permanently" record its interest in a certain concept.

So  we conclude that the  Worth facets are (i)  not finely tuned, yet
(ii) provide important  global information about the  relative values
of  concepts.   If  the  Worth facets  are  completely disabled,  the
rationality of AM's behavior hangs on the slender thread of "focus of
attention".

. SSSEC(How finely tuned is the Agenda?)

.RTEXSSSEC: SSSECNUM;

.COMMENT This assumes that expt number = sssec number;

.RTEXNO: SSSECNUM; 

The top few candidates  on the agenda always appear  to be reasonable
(to me).   If I work with  the system, guiding it, I  can cause it to
make a few discoveries it wouldn't otherwise make, and I can cause it
to make its typical ones much faster  (about a factor of 2). Thus the
⊗4very⊗* top task is not always the best.

If  AM randomly selects one of  the top 20 or  so tasks on the agenda
each time, what  will happen to  its behavior? Will it  disintegrate,
slow down by a factor of 10, slow down slightly,...?

This experiment required only a few seconds to set up, but demanded a
familiarity with  the  LISP  functions which  make  up  AM's  control
structure.   At  a  certain  point,  AM asks  for  Best-task(Agenda).
Typically,  the LISP function  Best-task is  defined as CAR  -- i.e.,
pick the  first member from  the list  of tasks.  What I  did was  to
redefine Best-task as  a function which randomly selected  n from the
set  {1,2,...,20},  and  then  returned  the  n⊗Ath⊗*  member of  the
job-list.
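
The modification can be sketched as follows (a Python stand-in for
the actual Interlisp redefinition):

```python
import random

def best_task(agenda, top_n=20):
    """Instead of returning the CAR (first element) of the agenda,
    randomly select one of the top 20 or so tasks."""
    return agenda[random.randrange(min(top_n, len(agenda)))]

agenda = list(range(100))                  # tasks in priority order
picks = [best_task(agenda) for _ in range(1000)]
assert all(p < 20 for p in picks)          # never reaches below the top 20
```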

If you watch the top job on the agenda, it will  take about 10 cycles
until  AM  chooses it.  And  yet there  are  many good,  interesting,
worthwhile jobs sprinkled  among the top  20 on the  agenda, so  AM's
performance is cut by merely a factor of 3, as far as cpu time per
given major  discovery.  Part of this  better-than-20 behavior is due
to the fact  that the 18⊗Ath⊗*  best task had  a much lower  priority
rating than  the top few, hence was  allocated much less cpu  time for
its  quantum  than  the top  task  would have  received.    Whether it
succeeded or  failed, it  used up  very little  time.   Since AM  was
frequently  working on a  low-value task,  it was unwilling  to spend
much time or space on it. So  the mean time allotted per task fell  to
about 15 seconds (from the typical 30  secs). Thus, the "losers" were
dealt  with quickly,  so the  detriment  to cpu-time  performance was
softened.

Yet AM is much less rational in its sequencing of tasks. A topic will
be dropped  right in the middle,  for a dozen cycles,  then picked up
again.   Often a  "good" task will  be chosen, having  reasons all of
which were true  10 cycles ago --  and which are clearly  superior to
those  of the last  10 tasks. This  is what  is so annoying  to human
onlookers.

Conclusion: Picking (on the average) the 10th-best candidate  impedes
progress by  a factor  less  than 10  (about a  factor of  3), but  it
dramatically   degrades  the  "sensibleness"  of   AM's  behavior,  the
continuity of  its actions.   Humans place  a big  value on  absolute
sensibleness, and believe that doing  something silly 50% of the time
is  ⊗4↓_much_↓⊗* worse  than merely being half as  productive as  always
doing the next most logical task.

Corollary: having 20 multi-processors simultaneously execute the top
20 jobs will roughly triple  the rate of "big" discoveries.  That is,
neither a full factor of 20, nor no gain at all.

Another experiment in this same vein was done, one which was designed
to be far more  crippling to AM. Be-threshhold was held  at 0 always,
so  ⊗4any⊗*  task which  ever got  proposed  was kept  forever  on the
agenda, no matter  how low its priority.  The Best-task function  was
modified so it randomly selected any  member of the list of jobs. As
a final insult, the Worth facets of all the concepts were initialized
to 200 before starting AM.

Result: Many  "explosive" tasks were  chosen, and  the number of  new
concepts  increased rapidly.  As expected,  most of  these  were real
"losers".  There seemed no  rationality to AM's sequence of  actions,
and it  was  quite boring  to watch  it floundering  so. The  typical
length of the agenda was about 500, and AM's performance was "slowed"
by at least a couple  orders of magnitude. A more subjective  measure
of its "intelligence" would say  that it totally collapsed under this
random scheme.

Conclusion: Having 500 processors simultaneously execute all the jobs
on the agenda would increase  AM's performance less than a  factor of
10.   The truly "intelligent"  behavior AM exhibits  is its plausible
sequencing of tasks.

Let's dig  inside the  agenda scheme  now. One  of the  highly-touted
ideas in AM is the  attaching of reasons to the tasks  on the agenda,
and  using those  reasons and  their ratings  to compute  the overall
priority value  assigned to  each task.   An  experiment  was done  to
ascertain the  amount of  intelligence that  was emanating  from that
idea.

The  global  formula assigning  a  priority  value to  each  job was
modified. We let it still  be a function of the reasons for  the job,
but we "trivialized" it: the priority of a job was computed as simply
the number of reasons it has.  (normalized by multiplying by 100, and
cut-off if over 1000).

This raised the new question  of what to do if several  jobs all have
the  same  priority. I  suppose  the  answer is  to  execute  them in
stack-order (most recent  first)$$ Why? Because  (i) it sounds  right
intuitively  to me,  and  mainly because  (ii) this  is  what AM  did
anyway, with no extra modification. $.

Result: <<This experiment hasn't been performed yet!>

Conclusion:

. SSSEC(What if certain key concepts are eliminated?)


Feeling  in  a  perverse mood  one  day,  I  eliminated  the  concept
"Equality" from AM, to see what it would then do.  Equality was a key
concept, because  AM  discovered  Numbers via  the  technique  of
generalizing  the  relation "Equality"  (exact  equality  of 2  given
structures,  at  all  internal  levels).   What  would  happen  if we
eliminated this  path?  Will  AM rederive  Equality?  Will it  get  to
Cardinality via another route? Will it do some set-theoretic things?

Result: <<This expt (elim. equality) is not yet done!>

Similar experiments: eliminate various other specific concepts, add a
few specific ones, modify some.

. SSSEC(What if certain heuristics are tampered with?)

add/delete/modify certain heuristics

not done yet

. SSSEC(Can AM work in a new domainα: Plane Geometry?)

This big experiment has already been done, but needs to be written up
nicely.

The author  added a new base  of concepts to the  ones already in AM.
Included  were:   Point,   Line,   Angle,   Triangle,   Equality   of
pts/lines/angles/triangles.

Results:  fairly good  behavior.  Derived  congruence, similarity  of
triangles.  Derived the idea of timberline in several ways.

Use for Goldbach's conjecture: any angle (0-180 degrees) can be built
up (to within  1 degree) as  the sum of  two angles of prime  degrees
(<180).

.GEOEX: PAGE;